We present bioinformatics applications and 
technologies created and used by Agilent LSCA 
Informatics. We review the current product line 
and briefly explain the architecture of our three- 
tiered system. We then present one example of 
an enterprise workflow: outsourcing GeneSpring 
GX computations to RemoteServers, thus freeing 
the local resources for other tasks. We describe 
the concept and go over the key problems we had 
to solve. Finally, we briefly introduce the Java 
and Web 2.0 technologies used by our product 
line. 


1. INTRODUCTION 


LSCA Informatics’ main product is a desktop
application called GeneSpring GX (GSGX).
GSGX is a visualization and analysis tool
designed for use with gene expression data. We
transformed the GSGX code base into a more
general framework, the GeneSpring platform (GS
platform), on which multiple bioinformatics
products similar to GSGX can be easily built.
Several Agilent products based on the GS
platform have been released.
Fig. 1. GeneSpring WorkgroupServer architecture 


Our GeneSpring WorkgroupServer web
application (WGS, see Fig. 1) stores the data from
GeneSpring applications in a centralized
database and enables scientists to collaborate
on larger, company-wide research projects. It
can also be used to outsource demanding
computations from GSGX to RemoteServers (headless
GSGX instances with an XML interface) and to
enforce FDA 21 CFR Part 11 compliance.


2. OUTSOURCING COMPUTATIONS TO 
REMOTE SERVERS 


There are two ways of executing computations
in GSGX: locally (synchronously) and remotely
(asynchronously). The second way (“remote
execution”) involves sending the computation
to one of the RemoteServers through WGS, which
acts as a job scheduler, and later downloading
the results back to GSGX.
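
The difference between the two modes can be sketched in a few
lines of Java. This is an illustration only, not GSGX code: the
names ExecutionModes, computeLocally, and computeRemotely are
hypothetical, and a local thread pool stands in for WGS and the
RemoteServers.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical helper contrasting synchronous and asynchronous execution. */
final class ExecutionModes {
    /** Local execution: the caller blocks until the result is ready. */
    static <T> T computeLocally(Supplier<T> computation) {
        return computation.get();
    }

    /** Remote-style execution: returns at once; the result is collected later. */
    static <T> CompletableFuture<T> computeRemotely(Supplier<T> computation,
                                                    ExecutorService remote) {
        return CompletableFuture.supplyAsync(computation, remote);
    }
}

final class ExecutionModesDemo {
    public static void main(String[] args) {
        // Assumption: a single background thread plays the role of a RemoteServer.
        ExecutorService remote = Executors.newSingleThreadExecutor();
        try {
            Integer local = ExecutionModes.computeLocally(() -> 6 * 7);
            CompletableFuture<Integer> pending =
                    ExecutionModes.computeRemotely(() -> 6 * 7, remote);
            // The caller is free to do other work here before fetching the result.
            Integer fetched = pending.join();
            System.out.println(local + " " + fetched);
        } finally {
            remote.shutdown();
        }
    }
}
```

In the real system the pending result outlives the client session,
which an in-process future cannot do; the sketch only shows the
blocking versus non-blocking call pattern.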

The concept is analogous to the concept of 
computer users printing documents: the printing 
jobs are mediated by the print server connected 
to multiple printers. 

Remote execution frees up the user's computer
for other work and removes the need to wait
actively during the computation (some
computations take hours or even days). In this
“fire and forget” mode, the user can send a
computation to WGS and then shut down or log off.
The next time the user logs into WGS through
GSGX, a notification message about the finished
computations is displayed. The results are just
one additional click away. To make this “magic”
happen, we had to solve several problems.

1. Computation representation and exe- 
cution. All primitive GSGX computations are 
described by XML files. Each description in- 
cludes information about inputs, outputs, and the 
internal computation name. Primitive computa- 
tions can be combined visually into more com- 
plex computations (scripts) using ScriptEditor. 
A module called Script Engine executes scripts 
recursively until a primitive computation is 
reached, at which point the actual algorithm im- 
plementation in Java is executed and the results 
propagated accordingly. 

2. Data representation. Since inputs and
outputs are first-class GSGX data objects
(Genes, Gene Lists, Experiments...), each script,
in order to be executable, has to embed serialized
inputs to feed data to the Script Engine inside
the RemoteServer. One exception is the case of a
data object that is stored remotely on WGS. In
this case we just embed a pointer to the object
on WGS¹. Once the computation is done, the
outputs are again serialized into XML and then
deserialized on demand later in GSGX.
Fig. 2. Outsourcing computations on Remote Servers 


3. Communication channel. Once a computation
is wrapped in a script, the script is in turn
wrapped in a job (also represented as XML),
which is sent from GSGX to WGS. WGS
receives the job and puts it into the FIFO job
queue. The job is then sent to the first available
RemoteServer. As soon as the RemoteServer starts
receiving the job, it puts itself into the “busy”
state. Once the job’s computation is done, the
RemoteServer sends the results to WGS and puts
itself back into the “idle” state. WGS receives
and saves the results. The user can monitor job
execution from GSGX’s Remote Execution Queue
window. Once a job is finished, the user is
notified and the results can be downloaded and
used. The whole process is depicted in Fig. 2.

4. Easy remote execution framework. 
Each GSGX computation window presents two 
options to the user: “Compute Locally” and 
“Compute on a RemoteServer”. This plugin-like 
feature (Fig. 3) is available to the application 
developers, provided that they implement certain 
interfaces and supply certain objects when 
needed. The framework frees the developers 
from worrying about remote execution and en- 
ables them to focus on the algorithms and com- 
putation implementations instead. 
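
The recursive script execution and the FIFO job handling described
in the items above can be sketched as follows. This is a minimal
in-memory illustration under strong assumptions, not the product
code: all class names (Computation, Primitive, Script, Job, Worker,
FifoScheduler) are hypothetical, a plain double replaces the
serialized GSGX data objects, and no XML or networking is involved.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.DoubleUnaryOperator;

/** A computation is either a primitive algorithm or a script of computations. */
interface Computation {
    double execute(double input);
}

/** Primitive computation: runs the actual algorithm implementation. */
final class Primitive implements Computation {
    private final DoubleUnaryOperator algorithm;
    Primitive(DoubleUnaryOperator algorithm) { this.algorithm = algorithm; }
    @Override public double execute(double input) { return algorithm.applyAsDouble(input); }
}

/** Script: runs its steps in order, recursing until primitives are reached. */
final class Script implements Computation {
    private final List<Computation> steps;
    Script(List<Computation> steps) { this.steps = steps; }
    @Override public double execute(double input) {
        double value = input;
        for (Computation step : steps) {
            value = step.execute(value); // a step may itself be a Script
        }
        return value;
    }
}

/** A job wraps a script together with its (here trivially simple) input. */
final class Job {
    final int id;
    final Computation script;
    final double input;
    Job(int id, Computation script, double input) {
        this.id = id;
        this.script = script;
        this.input = input;
    }
}

/** Stand-in for a RemoteServer: "idle" when it holds no job, "busy" otherwise. */
final class Worker {
    Job current;
    boolean isBusy() { return current != null; }
}

/** Stand-in for the WGS scheduler: a FIFO queue in front of a fixed set of workers. */
final class FifoScheduler {
    private final Queue<Job> queue = new ArrayDeque<>();
    private final Map<Integer, Double> results = new HashMap<>();
    private final List<Worker> workers;

    FifoScheduler(List<Worker> workers) { this.workers = workers; }

    /** Enqueue a job and hand it out immediately if a worker is idle. */
    void submit(Job job) {
        queue.add(job);
        dispatch();
    }

    /** Give the oldest waiting jobs to idle workers. */
    private void dispatch() {
        for (Worker worker : workers) {
            if (!worker.isBusy() && !queue.isEmpty()) {
                worker.current = queue.remove();
            }
        }
    }

    /** Run the worker's job, save the result, mark the worker idle, dispatch again. */
    void finish(Worker worker) {
        Job job = worker.current;
        if (job == null) {
            return; // nothing to do for an idle worker
        }
        results.put(job.id, job.script.execute(job.input));
        worker.current = null;
        dispatch();
    }

    /** Saved result for a job, or null if the job has not finished. */
    Double resultOf(int jobId) { return results.get(jobId); }

    int waiting() { return queue.size(); }
}
```

With a single Worker, submitting two jobs leaves the second one in
the queue until finish is called for the first, which mirrors the
busy/idle cycle described in item 3.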


¹ Note that in this case embedding the object into the input
would lead to redundant network traffic between GSGX and
WGS. This means that for repeated computations the total
execution time of the whole batch is shorter if the input
objects are initially uploaded to WGS.
Fig. 3. Remote execution plugin-like framework GUI 


3. TECHNOLOGIES FOR APPLICATION 
DEVELOPMENT 


GeneSpring applications are developed in
the Java programming language, which enables
us to deploy on any modern OS.

The WGS data storage uses an Oracle database.
The middle tier and the front end use Web 2.0
open source frameworks, which orchestrate
communication among POJOs (Plain Old Java
Objects). POJOs are populated by a transparent
persistence framework (Hibernate), displayed via
a GUI framework (Tapestry), and obtained from
a web container (Spring). The key concept is that
the POJOs themselves are not aware of the bigger
picture, which makes them less coupled and
better suited for unit testing.
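
As an illustration of this idea, a POJO for a gene list might look
like the sketch below. The class and its fields are hypothetical
and not taken from the WGS code base; the point is only that the
class imports nothing from Hibernate, Tapestry, or Spring, on the
assumption that persistence mapping and wiring are configured
outside the class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical POJO: plain fields, a no-argument constructor, and accessors.
 * It has no dependency on any persistence, GUI, or container framework.
 */
class GeneList {
    private Long id;
    private String name;
    private final List<String> geneIds = new ArrayList<>();

    GeneList() { }

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }

    String getName() { return name; }
    void setName(String name) { this.name = name; }

    /** Read-only view, so callers must go through addGene to modify the list. */
    List<String> getGeneIds() { return Collections.unmodifiableList(geneIds); }

    /** Adds a gene identifier unless it is already present. */
    void addGene(String geneId) {
        if (!geneIds.contains(geneId)) {
            geneIds.add(geneId);
        }
    }
}

final class GeneListDemo {
    public static void main(String[] args) {
        // A unit test needs no database, web server, or container to exercise the class.
        GeneList list = new GeneList();
        list.setName("candidates");
        list.addGene("BRCA1");
        list.addGene("BRCA1"); // duplicate, ignored
        System.out.println(list.getName() + ": " + list.getGeneIds().size());
    }
}
```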


4. CONCLUSION 


We presented a framework for outsourcing
computations to RemoteServers using WGS as a
job scheduler. The framework, once implemented
in the GS platform, will be inherited by
any new platform-based product.

In the future, we hope to use distributed and
64-bit computing and more Web 2.0 technologies
to further enhance the performance of
WGS and GS platform-based applications, thus
enabling our customers to perform complex
analyses of biological data.